Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections

Authors

  • Ping Li
  • Trevor J. Hastie
  • Kenneth Ward Church
Abstract

For dimension reduction in l1, the method of Cauchy random projections multiplies the original data matrix A ∈ R^{n×D} with a random matrix R ∈ R^{D×k} (k ≪ min(n,D)) whose entries are i.i.d. samples of the standard Cauchy C(0, 1). Because of the impossibility results, one cannot hope to recover the pairwise l1 distances in A from B = AR ∈ R^{n×k} using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computation, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. The sample median estimator and the geometric mean estimator are asymptotically (as k → ∞) equivalent, but the latter is more accurate at small k. We derive explicit tail bounds for the geometric mean estimator and establish an analog of the Johnson-Lindenstrauss (JL) lemma for dimension reduction in l1, which is weaker than the classical JL lemma for dimension reduction in l2. Asymptotically, both the sample median estimator and the geometric mean estimator are about 80% efficient compared to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating the distribution of the MLE by an inverse Gaussian.
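The projection and the two simpler estimators described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data matrix and dimensions n, D, k are arbitrary assumptions, and the sample median estimator is shown without the paper's finite-k bias correction. It relies on the facts that the entries of the projected difference B[i] − B[j] are i.i.d. Cauchy with scale d = ||A[i] − A[j]||_1, that the median of |C(0, d)| equals d, and that E|x|^(1/k) = d^(1/k) / cos(π/(2k)).

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, k = 3, 1000, 100

A = rng.standard_normal((n, D))      # original data matrix, n x D (illustrative)
R = rng.standard_cauchy((D, k))      # i.i.d. C(0, 1) entries, D x k
B = A @ R                            # projected data B = AR, n x k

# Entries of B[0] - B[1] are i.i.d. Cauchy with scale d = ||A[0] - A[1]||_1.
x = B[0] - B[1]
d_true = np.abs(A[0] - A[1]).sum()

# Sample median estimator: the median of |C(0, d)| is exactly d
# (asymptotically unbiased; the paper's finite-k correction is omitted here).
d_median = np.median(np.abs(x))

# Bias-corrected geometric mean estimator: since E|x_j|^(1/k) =
# d^(1/k) / cos(pi/(2k)), multiplying the geometric mean of |x_j|
# by cos(pi/(2k))^k removes the bias.
d_gm = np.cos(np.pi / (2 * k)) ** k * np.exp(np.mean(np.log(np.abs(x))))
```

With k = 100 samples per pair, both estimates should typically fall within roughly 15-20% of the true l1 distance, reflecting the O(1/√k) standard error.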


Similar articles

Estimators and tail bounds for dimension reduction in lα (0 < α ≤ 2) using stable random projections

Abstract The method of stable random projections is popular in data stream computations, data mining, information retrieval, and machine learning, for efficiently computing the lα (0 < α ≤ 2) distances using a small (memory) space, in one pass of the data. We propose algorithms based on (1) the geometric mean estimator, for all 0 < α ≤ 2, and (2) the harmonic mean estimator, only for small α (e...

Full text

Reconstruction of sparse signals from l1 dimensionality-reduced Cauchy random-projections

Dimensionality reduction via linear random projections is used in numerous applications including data streaming, information retrieval, data mining, and compressive sensing (CS). While CS has traditionally relied on normal random projections, corresponding to l2 distance preservation, a large body of work has emerged for applications where l1 approximate distances may be preferred. Dimensionali...

Full text

Random projections of random manifolds

Interesting data often concentrate on low dimensional smooth manifolds inside a high dimensional ambient space. Random projections are a simple, powerful tool for dimensionality reduction of such data. Previous works have studied bounds on how many projections are needed to accurately preserve the geometry of these manifolds, given their intrinsic dimensionality, volume and curvature. However, ...

Full text

Very Sparse Stable Random Projections, Estimators and Tail Bounds for Stable Random Projections

The method of stable random projections [39, 41] is popular for data streaming computations, data mining, and machine learning. For example, in data streaming, stable random projections offer a unified, efficient, and elegant methodology for approximating the lα norm of a single data stream, or the lα distance between a pair of streams, for any 0 < α ≤ 2. [18] and [20] applied stable random pro...

Full text

Efficient Machine Learning Using Random Projections

As an alternative to cumbersome nonlinear schemes for dimensionality reduction, the technique of random linear projection has recently emerged as a viable alternative for storage and rudimentary processing of high-dimensional data. We invoke new theory to motivate the following claim: the random projection method may be used in conjunction with standard algorithms for a multitude of machine lea...

Full text


Journal:
  • Journal of Machine Learning Research

Volume 8, Issue 

Pages  -

Publication date: 2007